HPC Computation on Hadoop Storage with PLFS

Authors

Chuck Cranor

Milo Polte

Garth Gibson

Abstract

In this report we describe how we adapted the Parallel Log Structured Filesystem (PLFS) to enable HPC applications to read and write data from the HDFS cloud storage subsystem. Our enhanced version of PLFS lets HPC applications write concurrently from multiple compute nodes into a single file stored in HDFS, which allows them to checkpoint. Our results show that HDFS combined with our PLFS HDFS I/O Store module handles the concurrent-write checkpoint workload generated by a benchmark with good performance.

Acknowledgements: The work in this paper is based on research supported in part by the Los Alamos National Laboratory, under subcontract numbers 54515 and 153593 (IRHPIT), by the Department of Energy, under award number DE-FC02-06ER25767 (PDSI), by the National Science Foundation, under award number CNS-1042543 (PRObE), and by the Qatar National Research Fund, under award number NPRP 09-1116-1-172 (Qloud). We also thank the members and companies of the PDL Consortium (including Actifio, APC, EMC, Emulex, Facebook, Fusion-IO, Google, Hewlett-Packard, Hitachi, Huawei, IBM, Intel, LSI, Microsoft, NEC, NetApp, Oracle, Panasas, Riverbed, Samsung, Seagate, STEC, Symantec, VMware, Western Digital) for their interest, insights, feedback, and support.
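The concurrent-write idea in the abstract can be illustrated with a small in-memory sketch. It assumes the general log-structured approach: each writer appends only to its own data log and records an index entry mapping logical file offsets to positions in that log, so no two writers ever touch the same underlying object. The abstract does not describe the HDFS I/O Store module's internals, so the class and method names here (`WriterLog`, `LogStructuredFile`, `write`, `read_all`) are hypothetical, and neither HDFS nor the real PLFS API is modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class WriterLog:
    """Hypothetical per-writer state: an append-only data log plus an index."""
    data: bytearray = field(default_factory=bytearray)
    # Each index entry: (logical_offset, length, physical_offset, sequence)
    index: List[Tuple[int, int, int, int]] = field(default_factory=list)


class LogStructuredFile:
    """Toy model of one logical file backed by one log per writer."""

    def __init__(self) -> None:
        self.logs: Dict[int, WriterLog] = {}
        self._seq = 0  # global counter standing in for write timestamps

    def write(self, writer_id: int, logical_offset: int, payload: bytes) -> None:
        log = self.logs.setdefault(writer_id, WriterLog())
        physical_offset = len(log.data)
        log.data.extend(payload)  # append only; earlier bytes are never rewritten
        log.index.append((logical_offset, len(payload), physical_offset, self._seq))
        self._seq += 1

    def read_all(self) -> bytes:
        # Merge every writer's index and replay entries in write order,
        # so a later write to the same logical range wins.
        entries = [
            (seq, logical, length, physical, writer_id)
            for writer_id, log in self.logs.items()
            for (logical, length, physical, seq) in log.index
        ]
        entries.sort()
        size = max((logical + length for _, logical, length, _, _ in entries), default=0)
        out = bytearray(size)
        for _, logical, length, physical, writer_id in entries:
            chunk = self.logs[writer_id].data[physical:physical + length]
            out[logical:logical + length] = chunk
        return bytes(out)


if __name__ == "__main__":
    f = LogStructuredFile()
    # Three writers each checkpoint their own region, arriving out of order.
    f.write(1, 4, b"BBBB")
    f.write(0, 0, b"AAAA")
    f.write(2, 8, b"CCCC")
    print(f.read_all())
```

Because every write is an append to a log owned by a single writer, this pattern fits a store such as HDFS, where a file accepts appends from one writer at a time and cannot be overwritten in place. The cost moves to the read path, which has to merge the indexes before it can locate data.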


Similar resources

Pilot-Abstraction: A Valid Abstraction for Data-Intensive Applications on HPC, Hadoop and Cloud Infrastructures?

HPC environments have traditionally been designed to meet the compute demand of scientific applications and data has only been a second order concern. With science moving toward data-driven discoveries relying more and more on correlations in data to form scientific hypotheses, the limitations of existing HPC approaches become apparent: Architectural paradigms such as the separation of storage ...


MPJ Express Meets YARN: Towards Java HPC on Hadoop Systems

Many organizations—including academic, research, commercial institutions—have invested heavily in setting up High Performance Computing (HPC) facilities for running computational science applications. On the other hand, the Apache Hadoop software—after emerging in 2005— has become a popular, reliable, and scalable open-source framework for processing large-scale data (Big Data). Realizing the i...


myHadoop - Hadoop-on-Demand on Traditional HPC Resources

Traditional High Performance Computing (HPC) resources, such as those available on the TeraGrid, support batch job submissions using Distributed Resource Management Systems (DRMS) like TORQUE or the Sun Grid Engine (SGE). For large-scale data intensive computing, programming paradigms such as MapReduce are becoming popular. A growing number of codes in scientific domains such as Bioinformatics ...


Storage Support for Data-Intensive Applications on Extreme-Scale HPC Systems

Many believe that current high-performance computing (HPC) storage systems would not meet the I/O requirement of the emerging exascale computing because of the segregation of compute and storage resources. Indeed, our simulation predicts, quantitatively, that the system availability would go towards zero at exascale. This work proposes a storage architecture with node-local disks for HPC system...


A hardware and software computational platform for HiPerDNO (High Performance Distribution Network Operation) project

The HiPerDNO project aims to develop new applications to enhance the operational capabilities of Distribution Network Operators (DNO). Their delivery requires an advanced computational strategy. This paper describes a High Performance Computing (HPC) platform developed for these applications whilst also being flexible enough to accommodate new ones emerging from the gradual introduction of smar...




Publication date: 2012
